- Some comments on my rewrite of Sendmail.mc.
-
- Neil Rickert,
- rickert@cs.niu.edu
- Thu, 7 Jun 90
-
- (Note: These are ex post facto comments, and do not necessarily reflect my
- thinking during the early part of the design phase.)
-
- 1. Data structures.
-
- It has long been my philosophy that the three most important ingredients
- of good programming are: data structures, data structures, and data
- structures. I think I came up with a pretty good data structure for
- representing addresses.
-
- You can essentially think of the internal address as being represented as
- an array, or more accurately as an array implementation of a pushdown
- stack. The bottom element of the stack is the 'user' identifier in the
- final destination domain. Every other element is a pair consisting of
- a domain (the '@domain' portion) and an addressing format flag (',',
- ':' or '!'). The majority of the .cf file deals with the top of the
- stack. For example, you can think of the PATHTABLE lookup as popping the
- top element off the stack and then pushing the domains from its pathalias
- lookup value.
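A minimal Python sketch of that stack, for illustration only. It assumes a plain dict stands in for the PATHTABLE database and that a pathalias value looks like "h1!h2!dest!%s"; the names `InternalAddress` and `expand_path` are hypothetical. The list's last element is the top of the stack, and `__str__` prints it top first, in the "@a, @b: @c! u" notation used later in these notes.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

VALID_FLAGS = {",", ":", "!"}  # the addressing format flags named in the text


@dataclass
class InternalAddress:
    """Hypothetical model: user at the bottom, every other element is (domain, flag)."""
    user: str
    # The top of the stack is the LAST list element, so push/pop are cheap.
    stack: List[Tuple[str, str]] = field(default_factory=list)

    def push(self, domain: str, flag: str) -> None:
        if flag not in VALID_FLAGS:
            raise ValueError(f"unknown flag {flag!r}")
        self.stack.append((domain, flag))

    def pop(self) -> Tuple[str, str]:
        return self.stack.pop()

    def __str__(self) -> str:
        # Print top first, e.g. "@a, @b: @c! u".
        parts = [f"@{domain}{flag}" for domain, flag in reversed(self.stack)]
        return " ".join(parts + [self.user])


def expand_path(addr: InternalAddress, path_table: Dict[str, str]) -> InternalAddress:
    """Sketch of the PATHTABLE step: pop the top domain, push the hops of its path."""
    if not addr.stack:
        return addr
    domain, _ = addr.stack[-1]
    path = path_table.get(domain)  # assumed pathalias format, e.g. "h1!h2!dest!%s"
    if path is None:
        return addr  # no route known; leave the stack unchanged
    addr.pop()
    hops = path.split("!")[:-1]  # drop the trailing "%s" placeholder
    # Push the last hop first so the first hop ends up on top.
    # Assumption: every pushed hop gets the '!' flag.
    for hop in reversed(hops):
        addr.push(hop, "!")
    return addr


a = InternalAddress("u")
a.push("dest", ",")
print(expand_path(a, {"dest": "h1!h2!dest!%s"}))  # @h1! @h2! @dest! u
```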
-
- The mailer specific rulesets, of course, must use the fact that the stack
- is implemented as an array, so that they can visit each entry and modify
- the flag.
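Because the stack is really an array, a mailer-specific pass can walk every entry and rewrite only its flag. The sketch below assumes, purely for illustration, a UUCP-style mailer that wants every hop in '!' format; `rewrite_flags` and `show` are hypothetical names, and here the list is ordered top of stack first.

```python
from typing import List, Tuple

Hop = Tuple[str, str]  # (domain, flag); list ordered top of stack first


def rewrite_flags(hops: List[Hop], new_flag: str) -> List[Hop]:
    """Visit every entry and replace its format flag, leaving the domains untouched."""
    return [(domain, new_flag) for domain, _ in hops]


def show(hops: List[Hop], user: str) -> str:
    # Same notation as the text: "@a, @b: @c! u"
    return " ".join([f"@{d}{f}" for d, f in hops] + [user])


hops = [("a", ","), ("b", ":"), ("c", "!")]
print(show(rewrite_flags(hops, "!"), "u"))  # @a! @b! @c! u
```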
-
- There is, of course, one more overall flag, which I implemented with the
- choice of either '@' or '%' as the leading character for the primary
- domain. Think of this as an overall flag to distinguish between sender
- and recipient addresses. This was added when I realized I was pushing the
- limit of the 30 allowable rulesets, and I didn't wish to require code
- changes if it could be avoided. It was pretty obvious that the mailer
- specific rulesets for sender and recipient addresses were usually very
- similar, so this flag idea makes sense. The flag idea could easily have
- been extended to also distinguish between header and envelope addresses,
- but I felt that would be overreaching for the present.
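One way to picture the extra overall flag: the leading character of the primary (top) domain in the tokenized string doubles as a sender/recipient marker, so a single shared ruleset can branch on it. Which character marks which role is not stated above; this sketch assumes '%' for sender and '@' for recipient, and `encode`/`decode_role` are hypothetical helpers.

```python
from typing import List, Tuple

Hop = Tuple[str, str]  # (domain, flag), top of stack first


def encode(hops: List[Hop], user: str, is_sender: bool) -> str:
    """Build "<@a,@b:u>"; the first domain's leading character carries the role."""
    body = "".join(f"@{d}{f}" for d, f in hops) + user
    if hops and is_sender:
        body = "%" + body[1:]  # assumption: '%' marks a sender address
    return f"<{body}>"


def decode_role(token_string: str) -> str:
    """Read back only the overall flag, as a shared sender/recipient ruleset would."""
    return "sender" if token_string.startswith("<%") else "recipient"


rcpt = encode([("a", ","), ("b", ":")], "u", is_sender=False)
sndr = encode([("a", ","), ("b", ":")], "u", is_sender=True)
print(rcpt, decode_role(rcpt))  # <@a,@b:u> recipient
print(sndr, decode_role(sndr))  # <%a,@b:u> sender
```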
-
- 2. Logic.
-
- In my (admittedly biased) opinion, the internal logic is, in the main,
- simpler than in the original IDA. The original package was made overly
- complex by the continual reformatting from a bang path to a source route
- to a %-path, and then possibly converting it back again. Part of my
- motivation in doing this rewrite was my frustration with the complexity of
- the original IDA. Almost every time I studied it closely I found another
- logic error, usually caused by making unwarranted assumptions in the
- process of converting an address between formats. Fortunately, these
- logic errors rarely caused serious problems.
-
- It is much simpler to retain a consistent internal representation, and
- work mostly by changing the 'addressing format flag'.
-
- Most of the remaining complexity is because the internal representation as
- an array of domains must be crudely simulated in a tokenized character
- string. Except for the mailer specific rulesets, which need considerable
- flexibility, this would have been much easier to implement in code than in
- the replacement rulesets. A code version would probably be more reliable,
- too. Paul mentioned that Berkeley is working on a sendmail replacement.
- Perhaps they should look at this config, together with these comments, for
- one possible approach to address formatting.
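To see why the array has to be "crudely simulated", consider what a ruleset actually works with: a flat token list. In this sketch the operator set is an assumption chosen for the example (it is not sendmail's actual operator list), and `tokenize`/`pop_top` are hypothetical. Popping the top element means pattern-matching the leading tokens rather than indexing an array.

```python
import re
from typing import List, Tuple

# Assumed single-character operator tokens for this illustration only.
TOKEN_RE = re.compile(r"[<>@%,:!]|[^<>@%,:!]+")


def tokenize(s: str) -> List[str]:
    """Split into operator tokens and runs of other characters."""
    return TOKEN_RE.findall(s)


def pop_top(tokens: List[str]) -> Tuple[str, str, List[str]]:
    """Match '< @ domain flag ...' and return (domain, flag, remaining tokens)."""
    if (len(tokens) < 5 or tokens[0] != "<" or tokens[1] != "@"
            or tokens[3] not in {",", ":", "!"}):
        raise ValueError("no top element to pop")
    domain, flag = tokens[2], tokens[3]
    return domain, flag, ["<"] + tokens[4:]


domain, flag, rest = pop_top(tokenize("<@a,@b:@c!u>"))
print(domain, flag, "".join(rest))  # a , <@b:@c!u>
```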
-
- 3. Dealing with ambiguous addresses.
-
- I largely followed the interpretation of Lennart Lovestrand in the
- processing of input addresses. In particular, in an address containing
- '@', '!' and '%', I gave '%' a higher precedence than '!' (unless the
- address originates at a UUCP source and STRICTLY822 is NOT defined).
-
- Thus the address 'c!u%b@a' was converted to the internal form:
- @a, @b: @c! u
- (where the spacing is for readability, and <> are omitted).
-
- It is entirely possible that on some occasions this is incorrect, and the
- address should have been interpreted as:
-
- @a, @c! @b: u
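A sketch of that precedence rule. It handles only the simple shape [bang path][%-hops][@domain] used in the example: it takes the '@' domain first, then '%' hops from the right, then '!' hops from the left. The `percent_first=False` branch presumably corresponds to the UUCP-source case without STRICTLY822. The flag assignments (',' for the '@' domain, ':' for '%' hops, '!' for bang hops) are inferred from the two internal forms above, not taken from the configuration itself.

```python
from typing import List, Tuple

Hop = Tuple[str, str]  # (domain, flag), top of stack first


def split_percent(local: str) -> Tuple[str, List[Hop]]:
    """Peel '%' hops off the right: 'c!u%b' -> ('c!u', [('b', ':')])."""
    hops: List[Hop] = []
    while "%" in local:
        local, domain = local.rsplit("%", 1)
        hops.append((domain, ":"))
    return local, hops


def split_bang(local: str) -> Tuple[str, List[Hop]]:
    """Peel '!' hops off the left: 'c!u' -> ('u', [('c', '!')])."""
    hops: List[Hop] = []
    while "!" in local:
        domain, local = local.split("!", 1)
        hops.append((domain, "!"))
    return local, hops


def parse(address: str, percent_first: bool = True) -> Tuple[List[Hop], str]:
    hops: List[Hop] = []
    local = address
    if "@" in local:
        local, domain = local.rsplit("@", 1)
        hops.append((domain, ","))
    if percent_first:
        local, pct = split_percent(local)
        local, bang = split_bang(local)
        hops += pct + bang
    else:
        local, bang = split_bang(local)
        local, pct = split_percent(local)
        hops += bang + pct
    return hops, local


def show(hops: List[Hop], user: str) -> str:
    return " ".join([f"@{d}{f}" for d, f in hops] + [user])


print(show(*parse("c!u%b@a")))                       # @a, @b: @c! u
print(show(*parse("c!u%b@a", percent_first=False)))  # @a, @c! @b: u
```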
-
- I permitted sufficient residual ambiguity in many of the rulesets so that,
- when finally processed by ruleset 4, either of these internal forms would
- still finish up as 'c!u%b@a' in the final output address.
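A sketch of how one renderer can map both internal forms to the same string: it builds the external address from the bottom of the stack upward, wrapping the local part once per element according to its flag. It assumes ',' means an '@' domain, ':' a '%' hop and '!' a bang hop; this is an illustrative reconstruction, not the text of ruleset 4.

```python
from typing import List, Tuple

Hop = Tuple[str, str]  # (domain, flag), top of stack first


def render(hops: List[Hop], user: str) -> str:
    """Build the external address from the bottom of the stack upward."""
    local = user
    for domain, flag in reversed(hops):  # element nearest the user first
        if flag == "!":
            local = f"{domain}!{local}"  # bang hop goes on the left
        elif flag == ":":
            local = f"{local}%{domain}"  # %-hop goes on the right
        elif flag == ",":
            local = f"{local}@{domain}"  # '@' domain goes on the right
        else:
            raise ValueError(f"unknown flag {flag!r}")
    return local


form1 = [("a", ","), ("b", ":"), ("c", "!")]   # @a, @b: @c! u
form2 = [("a", ","), ("c", "!"), ("b", ":")]   # @a, @c! @b: u
print(render(form1, "u"), render(form2, "u"))  # c!u%b@a c!u%b@a
```

Both orderings collapse to the same string because a bang hop always lands on the left and a '%' hop on the right, so their relative order in the stack is not visible in the output.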
-
- 4. Ruleset #4.
-
- This ruleset is now in its fourth, and I hope final, rewrite. In each of
- my earlier versions I made some attempt to limit the degree of ambiguity
- allowed in the resultant address. But every approach risked eliminating
- some plausibly reasonable output address formats. I finally concluded
- that S4 should be totally general, and that reducing ambiguity should be
- handled in the mailer specific rulesets.
-
- There is a downside to this complete generality. It is theoretically
- possible to present an internal format address to ruleset #4 such that
- the output is completely uninterpretable. I doubt that such addresses
- will arise naturally, and in any case the mailer specific rulesets are
- designed to eliminate most of the problems that could permit such
- addresses.
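A small illustration of that downside, using a fully general renderer that applies each flag blindly (same assumed flag meanings: ',' for '@', ':' for '%', '!' for bang). Given two ',' elements it emits "u@b@a", which no RFC 822 parser will accept. The `demote_inner_commas` pass, which turns every ',' below the top into a '%' hop, is a hypothetical example of the kind of reduction a mailer-specific ruleset could perform, not the rule this configuration uses.

```python
from typing import List, Tuple

Hop = Tuple[str, str]  # (domain, flag), top of stack first
WRAP = {"!": "{d}!{rest}", ":": "{rest}%{d}", ",": "{rest}@{d}"}


def render(hops: List[Hop], user: str) -> str:
    """Fully general: apply each flag blindly, bottom of the stack first."""
    local = user
    for domain, flag in reversed(hops):
        local = WRAP[flag].format(d=domain, rest=local)
    return local


def demote_inner_commas(hops: List[Hop]) -> List[Hop]:
    """Hypothetical mailer-specific pass: only the top element may keep ','."""
    return [(d, f if i == 0 or f != "," else ":") for i, (d, f) in enumerate(hops)]


bad = [("a", ","), ("b", ",")]
print(render(bad, "u"))                       # u@b@a  (two '@': uninterpretable)
print(render(demote_inner_commas(bad), "u"))  # u%b@a
```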
-
- About the only ambiguity I allowed to remain after the mailer specific
- rulesets is that between the precedence of '%' and '!' in a mixed address.
-
- I allowed some residual ambiguity there because, as I commented above, the
- original assumptions when first parsing the input address may have been
- incorrect.
-
- 5. Residual complexity.
-
- Most of the remaining complexity is in rulesets #4, #7, #9, and #19.
- Complexity is unavoidable in rulesets #4, #7 and #9, which deal with
- conversion between internal and external forms.
-
- The complexity of ruleset #19 is caused by the combination of two
- factors: my wish to retain full domain addresses as far as possible,
- replacing them by UUCP names only in ruleset #4; and my decision to fully
- follow the logic of the original IDA in using the UUCPXTABLE lookup as one
- criterion for deciding whether to continue converting to '!' format.
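A rough sketch of the interaction described here, under several loudly stated assumptions: UUCPXTABLE is modeled as a dict from full domain name to UUCP name; once a '!' hop is seen, following hops keep being converted to '!' only while each has an entry in the table; and full domains are swapped for UUCP names only at the final rendering stage (the ruleset #4 step). `continue_bang`, `uucp_names` and the example domains are hypothetical, and the real ruleset #19 logic is certainly more involved.

```python
from typing import Dict, List, Tuple

Hop = Tuple[str, str]  # (domain, flag), top of stack first


def continue_bang(hops: List[Hop], uucpx: Dict[str, str]) -> List[Hop]:
    """After a '!' hop, keep converting later hops to '!' while UUCPXTABLE knows them."""
    out: List[Hop] = []
    converting = False
    for domain, flag in hops:
        if flag == "!":
            converting = True
        elif converting and domain in uucpx:
            flag = "!"  # table hit: continue the bang path
        else:
            converting = False  # table miss: stop converting
        out.append((domain, flag))
    return out


def uucp_names(hops: List[Hop], uucpx: Dict[str, str]) -> List[Hop]:
    """Final step only: swap full domains for UUCP names on '!' hops."""
    return [(uucpx.get(d, d) if f == "!" else d, f) for d, f in hops]


table = {"b.example.edu": "bvax"}
hops = [("a.example.com", "!"), ("b.example.edu", ":"), ("c.example.org", ":")]
print(uucp_names(continue_bang(hops, table), table))
```

Keeping full domain names until the last step is what forces the table to be consulted twice, once to decide the flag and once to choose the name.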
-
- Actually ruleset #19 (and its continuation in ruleset #20, or was it #21)
- was one of the places where there was a logic error in the original IDA.
- It never really used UUCPXTABLE the way the documentation claimed. I have
- a sneaking suspicion that all of the UUCPXTABLE-dependent code in ruleset
- #19 should be eliminated to reduce complexity. Now
- that the unnecessary conversion back and forth between '!' and '%' formats
- is eliminated, the original need for this code has probably disappeared.
-
-